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00 ' Abstract 

I This paper presents the solution to the following optimization problem: What is the shape 

^ ' of the two-dimensional region that minimizes the average Lp distance between all pairs of points 

\ if the area of this region is held fixed? [The Lp distance between two points x = {xi,X2) and 

' y = (2/1,2/2) in is — ?/i|^ + \x2 — y2\^)^^^ ■] Variational techniques are used to show that 

. the boundary curve of the optimal region satisfies a nonlinear integral equation. The special 

' case p = 2 is elementary and for this case the integral equation reduces to a differential equation 

. whose solution is a circle. Two nontrivial special cases, p = \ and p = 00, have already been 

' examined in the literature. For these two cases the integral equation reduces to nonlinear second- 

. order differential equations, one of which contains a quadratic nonlinearity and the other a cubic 

Oh' nonlinearity. 

1 ■ 

■ 1 Introduction 

S; 

^ \ A general class of optimization problems is stated as follows: Given a set of n points in a metric 

space and an integer k < n, find a subset of size k that minimizes or maximizes the average distance 
between all pairs of points in the subset. Because this problem is NP-complete [1], it is believed 
5^ I that there is no algorithm that solves this problem in time polynomial in n (or k). 

This class of optimization problems has been widely discussed in the computer science literature. 
One possible application for which the average distance is minimized is in the allocation of jobs to 
nodes in a supercomputer [1-5]. Another possible application is in VLSI layout, where the objective 
is to cluster nearby components of a circuit on a chip [6]. This class of problems is general enough 
to model physical problems, where one seeks the lowest-energy state of a set of particles having 
pairwise attractive or repulsive forces. 

Even special cases have subtle computational-complexity issues. For example, suppose that one 
is given a set of n integer grid points and the objective is to find a set of k points that minimizes the 
average pairwise Manhattan or Euclidean distance between points. It is not known if these special 
cases are NP-complete, and no polynomial-time algorithms have been discovered. Furthermore, 
even if the original n points make up a square region of the grid it is not known if this problem is 
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NP-complctc. This family of problems is so computationally difficTilt that the usual approach in 
the literature has been to find algorithms that produce approximate solutions [7]. 

This paper presents the solution to a continuum version of an optimization problem from this 
class of discrete problems. To be specific, our limited objective here is to identify and to character- 
ize the optimal two-dimensional geometrical shape of a blob that minimizes the average pairwise 
distance between points in the blob. An intuitive way to understand the optimal shape is to view 
think of the blob as a city. By optimal, we mean that the average travel distance between any 
two points in the city is minimized. For the case of a city, a particularly appropriate metric is the 
Manhattan distance, which is the sum of the east-west distance along streets and the north-south 
distance along avenues. It is interesting that while the optimal shape of a city is not circular, it is 
extremely close to circular [8,9]. 

In this paper we consider the general case of Lp norms in for which the Lp distance between 
two points X = {xi,X2) and y = (2/1,1/2) is defined to be 

\\^-y\\p={\x,-y,\P + \x2-y2\^Y^'' (p > 1). (1) 

We use variational methods to determine the shape of the region that minimizes the average Lp 
distance when the area of this region is held fixed. Specifically, we characterize the boundary curve 
h(x) of the optimal region and show that this boundary curve satisfies a nonlinear integral equation. 
This integral equation is new and is the principal result of this paper. 

This paper generalizes two earlier studies. Karp, McKellar, and Wong [8] consider the two 
special extreme cases p = 1 and p = 00. For these cases, they determine a differential equation 
that describes the boundary of the optimal region that minimizes the average distance between all 
points in the region. Bender et al. [9] study the dimensionless average pairwise distance D[h] that 
characterizes the degree of optimality of the region for p = 1 and they use variational methods to 
minimize the value of this functional D[h]. This variational approach leads directly to the nonlinear 
differential equation that describes the boundary curve. In Ref. [9] the numerical value of D[h] is 
also computed. 

Here we extend the variational methods introduced in Ref. [9] to the general case p> 1. For this 
situation the boundary curve satisfies an integral equation. We then examine three special cases: 
For p = 2 (the Euclidean metric) the integral equation reduces to a separable first-order nonlinear 
differential equation, whose solution, as one would expect, is a circle. For p = 1 (the Manhattan 
metric) and for p = 00 (the maximum or Chebychev metric) the integral equation reduces to the 
second-order nonlinear differential equations that are previously derived in Refs. [8,9]. 

This paper is organized as follows: In Sec. 2 we show how to derive the integral equation that 
defines the boundary of the optimal region for general p > 1. Then in Sec. 3 we examine this 
integral equation for the special cases p = 1, p = 2, and p = 00. 

2 Derivation of the Integral Equation 

In this section we first formulate the optimization problem and then perform a variational calcula- 
tion [10, 11] to obtain an integral equation whose solution is the boundary of the optimal region. 

Formulation of the continuum problem 

Let w{x) be the upper boundary of the optimal region. Without loss of generality we assume 
that w{x) is positive. The Lp distance defined in (1) is up-down and left-right symmetric in K^. 
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Thus, the lower bound of the region is —w{x) and w{x) = w(—x). The boundary of the optimal 
region is continuous, so there is a point x = a > at which the curve w{x) crosses the x-axis: 
w{a) = w{—a) = 0. 

Furthermore, the optimal region is symmetric about the 45° lines y = x and y = —x. This 
symmetry allow us to decompose the function w{x) into two functions, w{x) = g{x) below the line 
y = X and w{x) = h{x) above the line y = x: 



W(Xj 



h{x) 



(0 < .X < h), 
{b < X < a), 



(2) 



where b marks the point on the x-axis where the boundary curve w{x) crosses the line y = x. The 
symmetry about the line y = x implies that h{x) is the functional inverse of g{x): h{x) = g~^{x). 

Finally, we define the length scale of this problem by choosing, without loss of generality, to 
work in units such that b =1. We summarize the properties of the functions w{x), h{x), and g{x) 
as follows: 

w{0) = h{0) = a, 

w{l) = hil)=g{l) = l, 

h{x) = g'^ix) (0<a;<l), 

w{x) > {-a<x< a). (3) 
The functions g{x) and h{x) are illustrated in Fig. 1. 




Figure 1: Notation used in this paper. The boundary of the optimal region is symmetric with 

respect to reflections about the x axis, the y axis, and the 45° lines y = zizx. In the upper-half 
plane the boundary curve is w{x) [sec (3)]. The function w{x) crosses the y and x axes at the 
points (0, a) and (±a, 0). The slope of w{x) is at x = and infinite at a; = a. Also, w{x) crosses 
the line y = a; at the point (1, 1), and at this point its slope is —1. The portion of w{x) in the range 
< a; < 1 is h{x) and the portion of w{x) in the range 1 < a; < a is g{x). The functions h{x) and 
g{x) are inverses of one another because w{x) is symmetric about the line y = x. 

The area of the optimal region is given by the functional A[w] whose boundary curve is shown 
in Fig. 1. Note that A[w] is four times the area in the positive quadrant and eight times the area 
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in the positive octant: 

ra 

A[w\ = 4 / dxw{x) , 

Jx=0 



x=0 

A[h] = 8 I dx [h{x) - x] . (4) 

Jx=0 



We use the notation M[w] to represent the integrated sum of the distances between all pairs of 
points in the region: 

/•a /•w{x) ra /•w{u) 

M[w] = dx dy du dv {\x - u\P + \y- v\Pf/P . (5) 

J x=—a Jy=—w{x) Ju=—a J v=—w{u) 

By the symmetry of the boundary function w{x), we may restrict the above integration to the 
points {x,y) in one of the octants: 

fl fh{x) f-a nw{u) 

M[w]=8 dx dy du dv {\x - u\p + \y - v\p)^/p . (6) 

J x=0 Jy=x J u=—a Jv=—w{u) 

We adjust the integrands so that we integrate over positive regions only: 

pl rh{x) pa j"w{u) 

M[w] = 8 dx dy du dv\[\x-u\P + \y-v\P]^^P+[\x-u\P + {y + v)P]^/^ 

Jx=0 Jy=x Ju=0 Jv=0 

+ [{x + uf + \y- v\PfP + [{x + u)P + {y + vff P } . (7) 
Next, we use (3) to replace the function w with h{u) and g{u) to obtain: 

M[w] =8 C dx r^%y C du I [\x - u\P + \y- vf]'''^ + [\x - uf + {y + uff^ 

Jx=0 Jy=x Ju=Q Jv=0 ^ 

+ [(x + uY + \y- vffP + [{x + uf + iy + uffP } 

rl rhix) ra rg{u) 

+8 dx dy du dv\[\x-u\P + \y-v\P]^/P + [\x-u\P + {y + v)P]^/P 

Jx=0 Jy=x Ju=l Jv=0 

+ [{x + u)P + \y- v\P]'/P + [{x + u)P + {y + v)P]'/p } . (8) 
In addition, we make a change of variables in (8) to eliminate all reference to the function g-} 

M[h]=8 dx dy du dv \ [{x + uf + \y - v\PfP + [\x - u\P + \y - v\P]^/P 

Jx=0 Jy=x Ju=0 Jv=0 ^ 

+ [{x + uf + {y + vfl^/P + [\x - u\P + {y + v)P]^/p } 
-8 dx dy ds h'{s) dv{ [{x + h{s)f + \y- v\PfP + [{h{s) - xf + \y- v]^]'''^ 

Jx=Q Jy=x Js=0 Jv=0 

+ [(.T + his)r + iy + vyf/P + [(his) - xf + {y + vf]^'^ } . (9) 
^In this change of variables, u = h{s), g{u) = s, w : 1 — »■ o becomes s : 1 ^ 0. 



Variational calculation 

We must find the function h{x) that minimizes the functional M[h] subject to the constraint that 

the area A[h] is held fixed. Since, the functionals M[h] and A[h] have units of [length]^ and 
[length]^, respectively, a dimensionless measure of the average pairwise distance is given by the 
ratio D[h]: 

The functional D[h] is a dimensionless measure of the optimality of the region. Wc will now use 
variational methods to find the boundary curve h that minimizes the value of D[hi\. 
We begin by calculating the functional derivative of D[h] with respect to h{t): 

5D[h] _ 1 5M[h] 5 M[h] 5A[h] 

'6h(t) ~ (^[/i])5/2 sh{t) ~ 2 {A[h]y/^ dh{t) ■ ^ ' 

To evaluate the right side of this equation we calculate the functional derivative of M[h] in (9): 

^ = 16^ + J\y^ { [{x + tf + \y- h{m''P + [\x - t\P + \y- h{m"P 

+ [{x + ty + {y + h{t)YfP + [\x - t\P + {y + h{t)YfP } 

-16 C dx h'{x) r dy{ [{t + h{x)f + \h{t) - + [{h{x) - tf + \h{t) - yf^/P 

Jx=o ■Jy=o 

+ [{t + h{x)r + {h{t) + yfj'/P + [{h{x) - tf + {h{t) + yff'P } . (12) 

We also calculate the functional derivative of the area A[h] in (4): 

5A[h] 
6h{t) 

The above calculation expresses the functional derivative of D[h] as a double integral. However, 
after lengthy simplifications, we can reduce the double integral to a single integral. Setting the 
result to gives the integral equation satisfied by the function h that minimizes the functional 
D[h]: 

dx { [h'{x) + h'{t)\ {\x - t\P + [h{x) + h{t)Y ) ^'^ 
+ [h!{x) - h'{t)\ {\x - t\P + \h{x) - h{t)\pY'' - [h'{x) + h!{t)\ ({x + tf + \h{x) - h{t)\pY'' 
- [h\x) - h'it)] ((x + tf + [h{x) + h{t)fY' + [1 + h'{x)h'{t)\ ( [h{t) - xf + [h{x) + tfY' 
+ [1 - h'{x)h'{t)\ ( [h{t) + xf + [h{x) + tf ) ^'^ + [h'{x)h!{t) - 1] ( [h{t) - xf + [h{x) - tf ) ^'^ 
-[l + h!{x)h'{t)\ [[h{t) + xf + [h{x) -tf =Q. (13) 
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This equation is complicated, but we can rewrite it in a compact form by changing the integration 
variables and the limits of integration. The result is a nonlinear integral equation satisfied by the 
original function w: 

dx [w'{x) + w'{t)] \{\x - t\P + [w{x) + w{t)Y) - {\x + t\P + \w{x) - w{t)\P) ^/^l = . (14) 

-a '- 

The equivalent integral equations in (13) and (14) are the principal result of this paper. 



3 Special Values of p 

The integral equation (14) reduces to differential equations for three special values of p. Specifically, 
when p = 2 it becomes a first-order nonlinear differential equation, and for the extreme cases when 
p = 1 andp = oo it reduces to the second-order nonlinear differential equations derived in Refs. [8,9]. 
In this section we investigate these three special cases. 



Special case: p — 2 



If we set p = 2 in (14), we can dispense with the absolute- value signs and the integral equation 
now reads 



f dx [w' (x) + w' (t)] \l{x- ty + [w{x) + w{t)f -\J{x + tY + [w{x) - w{t)f 

Jx=—a L 



0. (15) 



It is not immediately obvious, but for this special case we can show that the integrand in this 
integral equation becomes a total derivative if w{x) satisfies the differential equation 



'w'{x)w{x) + x = . 
Indeed, it is easy to verify that if (16) holds, we can rewrite (15) in the form 



r dx^-^ 

Jx=-a dx3w{t) _ 



(x - tf + [w{x) + w{t)f y^'^ + (^{x + tf + [w{x) - w{t)f ) 



3/2' 



(16) 



0. (17) 



Evaluating this integral at the endpoints x = ibo and using the condition that w(^a) = gives 
the value for this integral. This verifies the integral equation (15). Finally, solving (16) for w{x) 
subject to the condition that tL'(ibl) = 1 gives the particular solution 



w 



{t) = V2 - f^. 



(18) 



Thus, the boundary of the optimal region is a circle of radius \/2- 

Let us calculate the value of D[fi\ for this circular region. For simplicity, we calculate M[/i] for 
a circle of radius 1 so that h{x) = ^Jl — x"^ and then divide by (^[/i])^/^ = tt^/^. We calculate D[h\. 



rl i-y/l—x'^ 

D[h] = TT-^I^ I dx I dy 

J x=—l J y=—^/\—x'^ 



du 



u=-l 



v=—\/l—u-^ 



dv \/{x — uY + {y — vY- 



It is best to evaluate this integral in polar coordinates: 



pi /•2tt fl /•2tt 

D[h] = 7r~^/^ / drr dO ds s d(j)\/ (r cos 6 — s cos cp)'^ + (r sin 6 — s sin < 

Jr=0 Je=0 Js=0 J(h=Q 
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Next, wc make the change of variable s = rt and use circular symmetry to evaluate the (f) 
integral. We obtain 

Jr=o Jt=o Je=o 
We interchange orders of integration and perform the r integral exactly to obtain 

ZL f'^ /'2-;r 

D[h] = -7r-3/2 / dtt / deVi + f^-^tcose. 

5 Jt=o Je=o 

The integrals with respect to t and 6 are easy to perform and the exact result is 



1 (-\ 

D[h] = T-TT^^/^ « 0.1915596. 



Special case: p = 1 



We now show that when p = 1, the integral equation (13) reduces to the differential equation that 
was derived by two different methods in Refs. [8] and [9]. We begin by setting p = 1 in (13) and 
simplify the result to obtain 

/ dx(h'{x){\x-t\-x-t)+h'{t)[h{x) + h{t)-\h{x)-h{t)\] - 2xh' {x)h' (t) + 2t) =0. (19) 

Jx=0 ^ ' 

Then, we perform several integrations by parts to obtain the simpler integral equation 

/ dx h{x) - h!{t) + 2h'{t) [ dx h{x) - h'{t) [ dx h{x) + h'{t)h{t)t = . (20) 

Jx=0 Jx=0 Jx=0 

We reduce this integral equation to a differential equation by defining the new function f{t), 
whose derivative is h{t): 

f{t)= f dxh{x). (21) 

Ju=Q 

In terms of f{t) the integral equation (20) becomes 

tnt)f"{t) + [2/(1) - l]f"{t) + fit) - fit) fit) = . (22) 
We must specify the boundary conditions satisfied by fit). By the definition in (21) we have 

/(0) = 0. (23) 

Also from (3) we have 

/(I) = hil) = 1 . (24) 

The solution to the differential equation (22) is uniquely determined by these boundary conditions. 
While an analytical solution is not known, the numerical solution is given in Ref. [9]. Using this 
numerical solution, it is shown that the numerical value of D[h] is 0.650245 952 951. 
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Special case: p = oo 

Assuming that a > and 6 > 0, limp^oo(o^ + b^)^^^ = max(a, b). Thus, for p = oo, (13) becomes: 



/' 

Jx=0 



dx I [h'{x) + h'{t)] max (^\x - t\, h{x) + h{t)j + [h'{x) - h'{t)] max (^\x - t\, \h{x) - h{t)\ ) 

- [h'{x) + h'{t)] max (^x + t, \h{x) - h{t)\ ) - [h'{x) - h'{t)] max (x + t, h{x) + h{t)^ 

+ [l + h'{x)h'{t)\ max {h{t) - x, h{x) + + [l - h'{x)h'{t)\ max [h{t) + x, h{ 



+ 



[/i'(x)/i'(i) - l] max [h{t) - x, h{x) + h'{x)h'{t)\ max [h{t) + x, /i(x) - i j | = . (25) 

The first step in simplifying (25) is to remove the max conditions. Since x and t are between 

and 1, wc know that Ix — 1| < 1 and x + 1 < 2 < h{x) + h{t). Moreover, if x > t, then |x — 1| = x — t 
and \h{x) — h(t)\ = h{t) — h{x). The function x + h[x) is increasing because the derivative of 
this function witli respect to x is positive. [This is because h'{x) is negative but less than 1 in 
absolute value in the region < x < 1; see Fig. 1.] Hence, x + h{x) >t + h{t), which implies that 
X — t > h(t) — h{x) when x > t. A similar argument for the case x <t gives |a: — t| > \h{x) — h{t)\ 
for all X and t in the interval [0, 1]. Also, since \x — t\ > \h{x) — h{t)\, we have x + t> \h{x) — h{t)\. 
We obtain 

dx I [h!{x) + h!{t)\ [h{x) + h{t)] + [h!{x) - h'{t)\ \x - t\ 

- [h'{x) + hOt)] {x + t) - [h!{x) - h!{t)\ [h{x) + h{t)\ 
+ [l + h' {x)h'{t)\ max {h{t) - X, h{x) +t^ + [l- h'{x)h'{t)] max (h{t) + x, h{x) + 

+ [h' {x)h' (t) — l] max (ji{t) - X, h{x) - - [ 1 + /i'(x)/i'(t)] max (^/i(i) + X, /i(x) - } =0. 

Next wc simplify the second half of the equation. We know that x + /i(x) and x — h{x) are 
increasing functions of x. Therefore, the inequality x > t is equivalent to both conditions h{x) — t > 
h{t) — x and x + h{t) > t + h{x) independently. Also, for all x and t it is true that h{x) + t > h{t) — x 
and h{t) + x > h{x) — t. We obtain 

/ dx [h'{x) + h'{t)] [h{x) + h{t)] + [ dx [h'{x) - h'{t)] |x - t\ 

Jx=0 Jx=0 



f 

Jx= 



dx [h'{x) + h'{t)] (x + t)- dx [h'{x) - h'{t)] [h{x) + h{t)] 

x=0 Jx=Q 

+ [ dx [1 + h'{x)h'it)] [h{x) + t]+ [ dx [1- h'{x)h\t)] [h{x) + 1] 

Jx=0 Jx=0 

+ [ dx [1- h\x)h'{t)] [h{t)+x] + [ dx [h'{x)h'(t) - l] [h(t) - x] 

Jx=t Jx=Q 

[ dx [h'{x)h'{t) - 1] [h{x) -t]- [ dx [1 + h'{x)h'{t)] [h{t) + x] | = 0. 

Jx=t Jx=Q ^ 



+ 

Expanding and combining terms, we get 



4h'{t) I dx h{x) - t^h\t) - 2th{t) + 4 / dx h{x) + h'{t) [h{t)f - 2h\t) = , 

Jx=Q Jx=Q 
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and finally using f{t) instead of h{t), we get 

4r(t) [/(I) - /(O)] - t^nt) - 2tf'{t) + 4 [/(t) - /(O)] + f"{t) [fit)] ' - 2f{t) = . (26) 

The solution to this differential equation that satisfies the boundary conditions in (23) and (24) is 
graphed in [8]. 

4 Discussion and Conclusions 

The nonlinear integral equation in (14) defines the exact boundary curve of the optimal blob that 
satisfies the variational problem posed in this paper. While this integral equation is compact enough 
to be displayed on one line, it is only a global characterization of the solution curve w{x). Although 
we have tried hard to do so, we are unable to convert this integral equation to a local (finite-order) 
differential-equation characterization for w{x) except for the three cases p = 1, p = 2, and p = oo. 
It is interesting that these three values of p represent the three most commonly used Lp metrics; 
namely, the Manhattan metric, the Euclidean metric, and the maximum metric. It would be a 
remarkable technical advance if the integral equation could be transformed to simpler equation for 
w{x) for other values of p. 

There are many ways to generalize the continuum problem solved in this paper. One can allow 
the blob to have holes or not to be simply connected. Furthermore, one can try to solve the analog 
of this problem for the case in which the dimension of space is not 2. Finally, one can try to use 
the exact solution of the continuum problem as a step in solving the discrete problem posed in the 
introduction to this paper. The continuum solution provides a good approximation to the solution 
of the discrete problem. The question is whether we can use the continuum solution to construct 
a polynomial-time algorithm for solving the discrete problem. 
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